INTRODUCTION

One of the most important issues in any educational environment is identifying
factors that promote academic success. A plethora of research on
such factors exists across most academic fields, involving a wide range of 
student demographics, and the definition of student success varies across the 
range of studies published. While much of the research is devoted to looking 
at student performance in particular courses and concentrates on examination
scores and grades, many authors have directed their attention to student
success in the context of an entire academic program; student success in this 
context usually centers on program completion or graduation and student 
retention. The analysis in this paper follows the emphasis of McKay on the 
importance of conducting repeated research on student completion of honors 
programs at different universities for different time periods. This paper uses a 
probit regression analysis as well as the logit regression analysis employed by 
McKay in order to determine predictors of student success in the honors program
at a small, public university, thus attempting to answer McKay’s call for
a greater understanding of honors students and factors influencing their success.
The use of two empirical models on completion data, employing different
base distributions, provides more robust statistical estimates than observed
in similar studies.

PREVIOUS LITERATURE

The early years of our research were concurrent with the work of McKay,
who studied the 2002-2005 entering honors classes at the University of North 
Florida and published his work in 2009. The development of our methodology 
was dependent on important previous work in this area. Yang and Raehsler, in 
an article published in 2005, described their use of an ordered probit model to 
show that the total score on the Scholastic Aptitude Test (SAT), the cumulative
grade point average, and the choice of academic major significantly influenced
expected grades in an intermediate microeconomics course. This paper follows
that use of a probit model and adds a logit model analysis; the two models
differ only in their underlying probability distributions.

Research in program effectiveness rather than success in a particular class 
varies across many different student cohorts. In a 2007 qualitative analysis of
field research, for instance, Creighton outlines important factors influencing
graduation rates among minority student populations. The study concentrates
equally on institutional factors, personal factors, environmental factors, individual
student attributes, and socio-cultural characteristics to explain differences
in graduation rates for underrepresented student populations. The basic
issues in that study are complex, and unfortunately no clear empirical evidence
is provided. Zhang et al. do provide an earlier (2002) empirical analysis
of student success in engineering programs across nine universities for the 
years 1987 through 2000. That paper boasted a sample of 39,277 students 
and used a multiple logistic regression model to show that high school grade 
point average and mathematics scores on the Scholastic Aptitude Test (SAT) 
were positively correlated with an increase in graduation and retention rates 
among engineering students. Interestingly, verbal scores on the SAT examination
were negatively correlated with graduation and retention rates among
engineering students in the longitudinal study. In 2007, Geiser and Santelices 
described expanding this work in a study of the relevance of high school GPAs 
to college GPAs among 80,000 students admitted to the University of California
system. Using a linear regression model, they found that high school
GPAs were consistently the strongest predictors of college grades across all 
academic disciplines and campuses in the study. They determined that this 
predictive power actually became stronger after the freshman year. 

McKay used a logit regression model to study retention in the honors program
at the University of North Florida. Using a sample of 1017 students in the
honors program from 2002 through 2005, he found that high school GPA was 
the best predictor of program completion. The study also found that gender 
was a strong predictor of student success in the honors program while SAT 
scores did not display a significant relationship with program completion. Our
study builds on this work by employing a different model and incorporating
the academic discipline of each student in the analysis. We also divide the SAT
score into math and verbal scores, as in the 2002 Zhang et al. study.

In more recent work published in 2013, Keller and Lacey studied student 
participation levels in the large honors program at Colorado State University 
and found that female students and students majoring in the liberal arts and 
natural sciences were more active in the program. Male students, along with 
business and engineering majors, tended to be less active in the program as 
measured by an index developed by the authors. Also in 2013, Goodstein and 
Szarek discussed program completion from an alternative view; rather than
empirically studying factors influencing program completion, the authors outlined
common reasons why students might not complete an honors program,
especially the need for extra time to study for professional school entrance 
examinations, an inability to find a workable thesis topic, and additional 
coursework required after adding another academic major. This area of inquiry 
is interesting as it provides a possible future line of empirical research. 

DATA 

Data for this study came from Clarion University, a public university in 
western Pennsylvania. Enrollment at Clarion University is approximately 
6,000, and the school is part of the Pennsylvania System of Higher Education, 
a collection of fourteen universities that collectively make up the largest higher 
education provider in the state of Pennsylvania (106,000 students across all 
campuses). The sample of 449 individuals used for this study includes students
who were admitted to the Clarion University Honors Program for the
years 2003 through 2013. Data for each student includes whether or not the 
student successfully completed the Honors Program (COMP), the college 
affiliation of his or her academic major (using three dummy variables named 
ARTSC for the College of Arts and Sciences, BUS for the College of Business 
Administration, and EDUC for the College of Education), the student’s gender 
(GENDER), high school grade point average (HSGPA), and both verbal and 
math SAT scores (VSAT and MSAT). The size of the entering class (SIZE) is
also included in the analysis. Dummy variables included in the model all take
values of either 0 or 1 and are meant to distinguish between different qualitative
characteristics of students in the sample. The dependent variable in this
analysis, COMP, takes on a value of 1 if the student successfully completed 
the Clarion University Honors Program and 0 otherwise. Likewise, GENDER 
is assigned a value of 1 when the student is male and a 0 when the student is 
female. ARTSC is set at 1 if the student is in the College of Arts and Sciences 
(0 otherwise), BUS is 1 if the student is in the College of Business Administration
(0 otherwise), and EDUC is 1 if the student is in the College of Education
(0 otherwise).
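
The coding scheme described above can be sketched in a few lines of Python. This is an illustration only: the function name and the record layout are hypothetical and are not taken from the study's data files.

```python
# Hypothetical helper; names and record layout are illustrative assumptions.
COLLEGES = ("ARTSC", "BUS", "EDUC")


def encode_student(completed, male, college):
    """Return the 0/1 indicators described in the text for one student."""
    if college not in COLLEGES:
        raise ValueError(f"unknown college code: {college}")
    row = {"COMP": int(completed), "GENDER": int(male)}
    # Exactly one of the three college dummies is set to 1.
    for code in COLLEGES:
        row[code] = int(college == code)
    return row


# A female business major who completed the program.
example = encode_student(completed=True, male=False, college="BUS")
print(example)
```

Because the three college dummies always sum to 1 for every student, one of them has to be left out of a regression that also contains a constant term.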

Given differences in requirements and grading practices across academic
disciplines, there is some theoretical support for including dummy variables 
on academic major (or the college of the academic major) in the analysis. 
McKay found gender and high school GPA to be significant predictors of success
in honors program retention using a slightly different empirical model.
As a consequence, we include these variables in our analysis. Table 1 below 
provides descriptive statistics for each variable in the sample. 

Descriptive statistics results show that a little over 66% of students in the 
sample completed the Clarion University Honors Program during the sample 
period. Approximately 32% in the sample are males. Academic major by college
affiliation of individuals in the sample breaks down to approximately
43% in the College of Arts and Sciences, 13% in the College of Business
Administration, and 44% in the College of Education. Students in the sample
have an average high school GPA of 3.82 with an average SAT score (combining
math and verbal scores) of 1240. Since students in this sample are part of
a university honors program, average grades and test scores far exceed similar
statistics for the general university student population. The SIZE variable,
measuring the number of students in each entering class, averages nearly 42 
students per year. With an average 66% completion rate, one would anticipate 
seeing around 28 students complete the honors program each year. 

Table 1: Summary of Descriptive Statistics

Variable   Mean    Standard Deviation   Minimum   Maximum   Skewness
COMP       0.66    0.47                 0         1         NA
SIZE       41.60   11.65                19        53        -0.73***
VSAT       620     55.95                480       800       0.06
MSAT       621     53.94                490       790       0.06
HSGPA      3.82    0.22                 2.33      4.00      -2.46***
GENDER     0.32    0.47                 0         1         NA
ARTSC      0.45    0.50                 0         1         NA
BUS        0.13    0.34                 0         1         NA
EDUC       0.42    0.49                 0         1         NA

* significant at the 0.10 level
** significant at the 0.05 level
*** significant at the 0.01 level

The measure of skewness provides information on how each variable is
distributed around the mean and introduces the first statistical test in this analysis.
A value of zero indicates a perfectly symmetric distribution; the normal
distribution is the classic example. A significantly negative skewness value
suggests a long tail (or relatively few observations) in the lower part of the
distribution. A significantly positive skewness measure suggests the reverse.
Critical analysis of skewness statistics displayed in Table 1 will be conducted
at the beginning of the results section below.
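
As a rough illustration of the skewness measure, the sketch below computes the moment coefficient of skewness (third central moment divided by the cubed standard deviation). The paper does not state which skewness formula or significance test was used, so this particular formula is an assumption, and the data are made up for illustration.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical GPAs bunched near 4.0 with a few low values (not the study's data).
gpas = [4.0, 4.0, 3.9, 3.9, 3.8, 3.8, 3.7, 3.0, 2.4]
# A symmetric sample for comparison.
symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]

print(f"skewness of hypothetical GPAs: {skewness(gpas):.2f}")
print(f"skewness of symmetric sample:  {skewness(symmetric):.2f}")
```

The hypothetical GPA sample yields a negative value because its few low observations form a long lower tail, which is the pattern reported for HSGPA in Table 1.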

RESULTS AND DISCUSSION 

Before looking at the empirical estimates of the logit and probit models 
described in the appendix, it is worthwhile to look back at basic statistics 
involving the distribution for the data set utilized. Measures of skewness do 
not appear to provide surprising results in Table 1. Entering high school GPA 
is highly skewed to the left indicating that very few students admitted have 
low GPAs. In addition to summarizing descriptive statistics for variables used 
in this study, we also need to look at how the measures are correlated with 
each other to obtain a sense of what variables to consider in the final empirical
model. Table 2 displays a correlation matrix of all variables collected in
the sample. A strong positive correlation exists between the high school GPA
and the completion rate for the honors program. A weaker but statistically significant
positive relation exists between the business student dummy variable
and honors program completion. As a consequence, students with higher high
school grades and who chose to be business majors have a higher probability
of completing the honors program. No other variables are significantly correlated
with completion rate.

Other values in the correlation matrix are interesting from a pure discussion
standpoint and might be worthy of more detailed analysis in the future.
For example, some gender differences occur regarding SAT performance and
choice of academic major in this sample of honors students. Male students in
the sample seem significantly more likely to score higher on the math portion
of the SAT given the positive correlation between GENDER and MSAT. Some
slight negative correlation between GENDER and VSAT suggests that female
students are more likely to score higher on the verbal section of the SAT, but
this relationship is not statistically significant. Likewise, male students are
more likely to choose an academic major in the College of Arts and Sciences
(positive correlation between GENDER and ARTSC) while females are more
likely to choose a major in education among students in this select sample
(negative correlation between GENDER and EDUC). High school GPA has a
significant positive correlation with scores in the math section of the SAT in
this sample but not with verbal scores; this is interesting given that the correlation
matrix establishes a positive correlation between HSGPA and COMP
and between HSGPA and MSAT but not between COMP and MSAT, seeming
to indicate that a high GPA in high school among students qualifying for the
honors program helps predict completion in the program along with higher
scores on the math section of the SAT. High scores on the math section of 
the SAT alone, however, do not help predict completion rates in the honors 
program, suggesting some inherent measure in high school grades that is not 
captured in the math portion of the SAT. Some would argue that high school 
grades incorporate a measure of effort that would positively link to completion
rates for any academic program. A specific empirical determination of this
linkage remains for future study. 

Figures 1 and 2 provide an illustrative example of how completion rates 
differ across academic majors and genders in the sample used for this analysis.
Figure 1 clearly indicates that the average completion rates among students
with majors in the College of Business Administration are substantially
higher than honors program completion rates for students in other colleges. 
Figure 2 illustrates that completion rates are somewhat higher among female 
students in the honors program than among male students in the program. 
While results across gender are similar to those seen in McKay, the results
concerning academic majors are substantially different from those observed
in Keller and Lacey.

A primary drawback to relying entirely on correlation data is that the 
precise relation between program completion rate (COMP) and each of the 
explanatory variables is hidden. For example, it is difficult to predict how a 
change in the high school GPA will influence the probability of honors program 
completion without a more detailed empirical model. Clearly, the explanatory 
variables are linked, and simple correlation will not typically provide a complete
story of how COMP is influenced by other measures in the sample. Also
problematic is a study of correlation values when the primary variable of interest
is qualitative (COMP takes on a value of either 0 or 1).

The virtues of the logit and probit models are described in the appendix, and
in Table 3 we present maximum likelihood estimates of the latent regression in 
the most relevant logit and probit model specifications. Logit model 1 includes 
all the variables in the specification while logit model 2 includes only the most 
statistically significant explanatory variables (using a 0.10 significance level 
as a determinant). Likewise, probit model 1 and probit model 2 use the same 
model specifications for the probit model estimation procedure. In both general
specifications, high school GPA is the most important predictor of honors
program completion rates while the business college dummy variable (BUS) 
is significant at the 0.10 level. No other explanatory variables were found to 
be statistically significant. 

Figure 1: Completion by Academic Major

From a statistical standpoint, results of the latent regression estimates fit
the data well when observing the likelihood-ratio (LR) statistic. All p-values for
LR are well below 0.01, indicating that variations in the program completion
variable (COMP) are substantially explained by variations in the explanatory
variables chosen in the analysis. As stated above, high school GPA and the
business college dummy variable are most significant. The positive sign on
the coefficient for HSGPA indicates that a higher high school GPA predicts
a higher probability of honors program completion. Likewise, the positive
sign of BUS suggests that students with majors in the College of Business
Administration are more likely to complete the program than students with
majors in other colleges. While SAT scores are used to screen students wishing
to enter the honors program, they do not help predict completion rate probabilities
in the program. Gender is also not a significant predictor of program
completion.
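
The LR p-values can be checked with a short calculation. The sketch below assumes the LR statistic is compared against a chi-square distribution whose degrees of freedom equal the number of slope coefficients (two for the specifications containing only HSGPA and BUS); the paper does not state the degrees of freedom, so that is an assumption. The closed form used here is valid only for even degrees of freedom.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    # sf(x) = exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


# LR statistics reported in Table 3 for logit model 2 and probit model 2.
p_logit2 = chi2_sf_even_df(19.67, df=2)
p_probit2 = chi2_sf_even_df(21.66, df=2)
print(f"logit model 2:  p = {p_logit2:.6f}")
print(f"probit model 2: p = {p_probit2:.6f}")
```

Under that assumption both p-values fall well below 0.001, which is consistent with the (.000) entries reported for these two specifications.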

For more precision, marginal effects of each variable on COMP using the 
logit and probit model estimates need to be calculated. Estimates above for the 
latent regression equations do not incorporate the non-linear nature of probability.
Using the cumulative logistic and normal distributions, marginal
effects are calculated for each of the four specifications presented in Table 
3. Empirical results matching the marginal effects on program completion 
(COMP) with each change in explanatory variable are presented in Table 4. 

The variables that matter the most in Table 4 are high school GPA and 
the business school dummy variable, so the logit model 2 and probit model 2 
are the primary specifications to consider. Results are provided for changes in 
the high school GPA, including an increase of 0.2, an increase of 0.5, and an 
increase of 1.0. Results for the logit model specification show that an increase 
of HSGPA by 0.2 leads to an increase in COMP of 0.067, or a 6.7% increase 
in the probability of program completion. The probit model specification provides
a similar estimate of a 6.8 percent increase for the same grade point
interval. When the high school GPA is 0.5 higher, the program completion 
rates increase by 14.9% and 15.4% when using the logit and probit model 
estimates respectively. A full increase of 1.0 points in the HSGPA variable 
increases the probability of completion by 24.0% and 25.2% for logit and 
probit model specifications respectively. Clearly a student’s high school GPA 
can effectively predict completion outcomes in the honors program. 
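
The kind of calculation behind these figures can be sketched as follows, using the logit model 2 coefficients from Table 3. The paper does not state the baseline at which the effects were evaluated, so the sketch assumes the sample means from Table 1 (HSGPA = 3.82, BUS = 0.13); the output is therefore illustrative and may differ from Table 4 in the later decimals.

```python
import math

# Logit model 2 estimates from Table 3.
B0, B_HSGPA, B_BUS = -5.58, 1.61, 0.54


def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))


def p_complete(hsgpa, bus):
    """Predicted completion probability under logit model 2."""
    return logistic_cdf(B0 + B_HSGPA * hsgpa + B_BUS * bus)


# Assumed baseline: sample means from Table 1 (not stated in the paper).
BASE_HSGPA, BASE_BUS = 3.82, 0.13
base = p_complete(BASE_HSGPA, BASE_BUS)

# Discrete change in probability for each GPA increment discussed in the text.
changes = {
    step: p_complete(BASE_HSGPA + step, BASE_BUS) - base
    for step in (0.2, 0.5, 1.0)
}
for step, change in changes.items():
    print(f"HSGPA +{step}: change in P(COMP=1) = {change:+.3f}")
```

The changes grow less than proportionally with the size of the GPA increment, which reflects the non-linear shape of the logistic function.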

For the business college dummy variable (BUS), a value of 0 means that 
the student is not in the business college while a value of 1 means the student 
does have an academic major within the business college. The 0.111 estimate 
using logit model 2 means that, all else being equal, a student deciding to 
select a major in the business college typically displays an 11.1% higher completion
rate than students with majors outside the college. The estimate using
probit model 2 provides an identical 11.1 percent increase. This shows that the
academic major selection with respect to the College of Business Administration
does make a difference on predicted completion rates.
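
For a dummy variable such as BUS, the effect is the difference between two predicted probabilities. The sketch below uses the probit model 2 coefficients from Table 3 and assumes, since the paper does not say, that the comparison is made at the sample mean high school GPA of 3.82.

```python
import math

# Probit model 2 estimates from Table 3.
B0, B_HSGPA, B_BUS = -3.38, 0.98, 0.33


def normal_cdf(z):
    """Cumulative standard normal distribution via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bus_effect(hsgpa):
    """P(COMP=1 | BUS=1) minus P(COMP=1 | BUS=0) at a given high school GPA."""
    index_without = B0 + B_HSGPA * hsgpa
    return normal_cdf(index_without + B_BUS) - normal_cdf(index_without)


effect_at_mean = bus_effect(3.82)  # assumed baseline: mean HSGPA from Table 1
effect_at_max = bus_effect(4.00)   # maximum HSGPA in the sample
print(f"BUS effect at HSGPA 3.82: {effect_at_mean:+.3f}")
print(f"BUS effect at HSGPA 4.00: {effect_at_max:+.3f}")
```

The effect shrinks slightly at the higher GPA because students there already have a high predicted completion probability, leaving less room for the business college indicator to raise it.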

Remaining variables in the analysis are displayed in logit model 1 and
probit model 1. Since results are nearly identical, a cursory analysis can be 
made by just looking at the probit model results. Female students, for example, 
have a completion rate that is approximately three percent higher than males 
in the sample. An increase in verbal SAT score by 100 predicts a 0.1% higher
completion rate while a 100-point increase in the math SAT score predicts a
2.5% increase in completion. Both results are relatively small when compared
to high school GPA results. Higher class size by an increment of ten and the 
choice to select an academic major in the College of Arts and Sciences lead to 
decreased predicted completion rates by 1.5% and 1.4% respectively. Again, 
these results are not statistically significant. 

Table 3: Logit and Probit Model Equation Estimates

Variable or Measure   Logit Model 1   Logit Model 2   Probit Model 1   Probit Model 2
CONSTANT              -5.82 (.006)    -5.58 (.007)    -3.53 (.006)     -3.38 (.000)
GENDER                -0.14 (.567)                    -0.08 (.569)
SIZE (x10^2)          -0.71 (.427)                    -0.41 (.456)
VSAT (x10^5)          5.53 (.977)                     4.78 (.967)
MSAT (x10^3)          1.13 (.582)                     0.70 (.572)
HSGPA                 1.58 (.000)     1.61 (.000)     0.95 (.000)      0.98 (.000)
ARTSC                 -0.06 (.783)                    -0.04 (.774)
BUS                   0.53 (.133)     0.54 (.099)     0.32 (.130)      0.33 (.091)
LR STATISTIC          22.82 (.002)    19.67 (.000)    22.86 (.002)     21.66 (.000)

p-values are in parentheses

CONCLUSION

This study serves as an important addition to the existing literature in
that it provides some empirical support for previous work with some interesting
variations. As McKay observed, we find that the high school GPA for
students in the honors program emerges as the most significant predictor of
program completion. The fact that SAT scores do not significantly help predict
expected completion rates suggests that high school GPAs may include measures
beyond the basic knowledge indicated in standardized tests. A paradox is
generated in that both high school GPAs and SAT scores are used to determine
whether entering students qualify for the Clarion University Honors Program.
One explanation is that, while SAT scores provide a basis for determining
academic potential, high school GPAs include an individual’s overall work
ethic and effort. We read of students who underperform in high school yet
score high on standardized tests. These types of students, as predicted by this
analysis, would not be as likely to complete the honors program using the
same level of effort in college. An empirical establishment of what GPA measures
would be an interesting extension of this analysis. One possible policy
implication of this result is that, if an honors program or college wishes to
increase completion or participation rate, a director or dean should target for
special scrutiny those individuals coming in with below-average high school
GPAs as they are more likely to drop the program.

Results in this analysis showing that business college students are more
likely than students in the arts and sciences or in education to complete the
honors program are different from previous studies. The overall discussion
in Goodstein and Szarek may support these findings. Most students from the
Clarion University College of Arts and Sciences are natural science majors,
typically in biology and physics. Most of these students study for professional
(especially medical) or graduate school exams, and the prospect of working
on a thesis at the same time can be daunting. Likewise, students in our college
of education are busy with student teaching, which takes time away from the
senior project. Business students do not consistently face these obstacles, so
they may remain in the program, but additional work needs to be done to see if
this is the case. Future analysis will attempt to determine how completion rates
are influenced by student involvement and whether differences exist among an
expanded demographic of students enrolled in the program.

Table 4: Marginal Probability Effects on Completion Probability for
Logit and Probit Models

Marginal Change          Logit Model 1   Logit Model 2   Probit Model 1   Probit Model 2
GENDER 0 to 1            -0.030                          -0.030
SIZE increase by 10      -0.016                          -0.015
VSAT increase by 50      +0.000                          +0.001
VSAT increase by 100     +0.001                          +0.001
MSAT increase by 50      +0.009                          +0.009
MSAT increase by 100     +0.024                          +0.025
HSGPA increase by 0.2    +0.065          +0.067          +0.066           +0.068
HSGPA increase by 0.5    +0.147          +0.149          +0.150           +0.154
HSGPA increase by 1.0    +0.237          +0.240          +0.248           +0.252
ARTSC 0 to 1             -0.014                          -0.014
BUS 0 to 1               +0.109          +0.111          +0.108           +0.111

An Empirical Analysis of Factors Affecting Completion Rates
Spring/Summer 2014

APPENDIX

Because of the discrete nature of the dependent variable in this study (COMP 
takes on a value of either 0 or 1), ordinary least squares regression would 
be an inappropriate model. The two most common models utilized when the 
dependent variable is discrete and binary are the logit and the probit models. 
The logit model utilizes the logistic (or exponential) function and is the model
of choice in McKay (2009). The probit model utilizes the standard normal
distribution in developing probabilities and is the additional method utilized
in this analysis. The underlying standard normal distribution allows for a more
uniform probability of obtaining a 0 or a 1 when compared to the logistic
function; however, both models tend to provide similar results for relatively
small changes in the independent variables. It is beneficial to report results
from both the logit and probit estimation procedures in order to observe any
possible variation in results. If the empirical results show a great deal of variation,
the model specification would be placed in question as it is dependent on
the assumed distribution of the dependent variable. On the other hand, if the
marginal impacts of changes in each variable on the probability of program
completion among honors students are consistent, a robust quantitative estimate
is verified.
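
How closely the two base distributions agree can be illustrated numerically. In the sketch below the logistic index is rescaled by the ratio of the HSGPA coefficients in Table 3 (1.61 for logit model 2 against 0.98 for probit model 2); using that ratio as the rescaling factor is an assumption made for illustration, not a step taken in the paper.

```python
import math


def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Assumed rescaling factor: ratio of logit to probit HSGPA coefficients in Table 3.
SCALE = 1.61 / 0.98

# Compare the two cumulative functions on a grid of index values from -3 to 3.
grid = [i / 10.0 for i in range(-30, 31)]
gaps = [abs(logistic_cdf(SCALE * z) - normal_cdf(z)) for z in grid]
max_gap = max(gaps)
print(f"largest difference in predicted probability: {max_gap:.4f}")
```

Once the index is rescaled, the two functions differ by only a small amount across the grid, which is why the logit and probit marginal effects in Table 4 are so close to each other.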

The standard binary logit or probit model is widely used for this dependent 
variable type and is built around a latent regression of the following form: 

(1) $y = x'\beta + \varepsilon$

where $x$ and $\beta$ are standard variable and parameter matrices, and $\varepsilon$ is a vector
of normally distributed error terms. The initial model considered for
the latent regression can be formulated as:

(2) $y_i = \beta_0 + \beta_1 \mathrm{GENDER}_i + \beta_2 \mathrm{VSAT}_i + \beta_3 \mathrm{MSAT}_i + \beta_4 \mathrm{HSGPA}_i + \beta_5 \mathrm{ARTSC}_i + \beta_6 \mathrm{BUS}_i$

The dummy variable EDUC is not included in the latent regression model in
order to avoid the dummy variable trap. For convenience, rather than writing
out the entire latent regression formula, the equation above can also be
written as:

(3) $y_i = \beta' x$

In both equation (2) and equation (3) the variable $y_i$ is the COMP variable
equal to 0 if student i did not finish the Clarion University Honors Program
and 1 if that student did successfully complete the program. For the probit
model, the probability that $y_i = 1$ can be calculated as

(4) $\Pr(y_i = 1) = \Phi(\beta' x)$

where $\phi$ is the standard normal density function and $\Phi$ is the cumulative
standard normal distribution function. For the logit function, the same probability
would be

(5) $e^{\beta' x} / (1 + e^{\beta' x})$

for each value of x. With a fair amount of calculation, the coefficients on a 
binary logit or probit model can be easily interpreted. Rather than treating the 
slope parameters in a linear fashion, the marginal effect of each explanatory 
variable can be calculated using the cumulative standard normal distribution 
in the case of the probit model or the cumulative exponential function for logit 
analysis. Using the notation above, the marginal effect of variable $x_i$ on the
dependent variable (y or COMP in this analysis) can be calculated using the
following equation for the probit analysis:

(6) $\partial E(y|x) / \partial x_i = [dF(\beta'x)/d(\beta'x)] \times \beta_i = \Delta\Phi(\beta'x)\,\beta_i$

where $\Delta$ represents the change in the cumulative normal distribution when
$x_i$ is changed. Analysis of the marginal effect of each explanatory variable
provides a better empirical description of how each variable influences the 
probability of a student completing the Clarion University Honors Program 
given the value of all other explanatory variables. Parameters for the probit 
model are attained using standard maximum likelihood estimation. Simply 
put, the marginal effects of any variable in a probit model are determined by 
calculating the change observed in the cumulative normal distribution when 
the variable in question incrementally changes. 

Likewise, marginal values for the logit model are obtained from the 
following: 

(7) $\partial E(y|x) / \partial x_i = [dF(\beta'x)/d(\beta'x)] \times \beta_i = \Delta\left(1/(1+e^{-\beta'x})\right)$

Maximum likelihood estimates are calculated in a similar fashion for the logit
model. Comparative statics for each variable can be done to determine how
each measure affects the probability students will complete the Honors Program.
Again, it is important to use both logit and probit analyses since each
assumes a different base distribution in calculating probabilities. As with the
probit model, the marginal changes are calculated by looking at changes in the
cumulative logistic function due to changes in the variable of interest.
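
For small changes in a continuous variable, the marginal effects in equations (6) and (7) reduce to the derivative of the cumulative function multiplied by the coefficient. The sketch below implements that derivative form; it is a textbook approximation offered for illustration, and it is not claimed to be the exact procedure used to produce Table 4, which reports discrete changes. The function names are hypothetical.

```python
import math


def normal_pdf(z):
    """Standard normal density, the derivative of the cumulative normal."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))


def probit_marginal(index, beta_i):
    """Derivative of Phi(beta'x) with respect to x_i: phi(beta'x) * beta_i."""
    return normal_pdf(index) * beta_i


def logit_marginal(index, beta_i):
    """Derivative of the logistic CDF with respect to x_i: p * (1 - p) * beta_i."""
    p = logistic_cdf(index)
    return p * (1.0 - p) * beta_i


# Model 2 coefficients from Table 3, evaluated at assumed baseline values
# (HSGPA = 3.82 and BUS = 0.13, the sample means in Table 1).
logit_index = -5.58 + 1.61 * 3.82 + 0.54 * 0.13
probit_index = -3.38 + 0.98 * 3.82 + 0.33 * 0.13
logit_slope = logit_marginal(logit_index, 1.61)
probit_slope = probit_marginal(probit_index, 0.98)
print(f"logit:  dP/dHSGPA = {logit_slope:.3f} per GPA point")
print(f"probit: dP/dHSGPA = {probit_slope:.3f} per GPA point")
```

Multiplying either slope by a small GPA increment gives a linear approximation to the discrete change; the approximation overstates the effect for larger increments because both cumulative functions flatten as the probability approaches 1.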